slot value
Hybrid Dialogue State Tracking for Persian Chatbots: A Language Model-Based Approach
Aghabagher, Samin Mahdipour, Momtazi, Saeedeh
Dialogue State Tracking (DST) is an essential element of conversational AI with the objective of deeply understanding the conversation context and leading it toward answering user requests. Due to high demands for open-domain and multi-turn chatbots, the traditional rule-based DST is not efficient enough, since it cannot provide the required adaptability and coherence for human-like experiences in complex conversations. This study proposes a hybrid DST model that utilizes rule-based methods along with language models, including BERT for slot filling and intent detection, XGBoost for intent validation, GPT for DST, and online agents for real-time answer generation. This model is uniquely designed to be evaluated on a comprehensive Persian multi-turn dialogue dataset and demonstrated significantly improved accuracy and coherence over existing methods in Persian-based chatbots. The results demonstrate how effectively a hybrid approach may improve DST capabilities, paving the way for conversational AI systems that are more customized, adaptable, and human-like.
- North America > Canada > Quebec > Montreal (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- (5 more...)
PyTOD: Programmable Task-Oriented Dialogue with Execution Feedback
Coca, Alexandru, Tseng, Bo-Hsiang, Boothroyd, Pete, Cheng, Jianpeng, Gaynor, Mark, Zhang, Zhenxing, Stacey, Joe, Guigue, Tristan, Alonso, Héctor Martinez, Séaghdha, Diarmuid Ó, Johannsen, Anders
Programmable task-oriented dialogue (TOD) agents enable language models to follow structured dialogue policies, but their effectiveness hinges on accurate state tracking. We present PyTOD, an agent that generates executable code to track dialogue state and uses policy and execution feedback for efficient error correction. To this end, PyTOD employs a simple constrained decoding approach, using a language model instead of grammar rules to follow API schemata. This leads to state-of-the-art state tracking performance on the challenging SGD benchmark. Our experiments show that PyTOD surpasses strong baselines in both accuracy and robust user goal estimation as the dialogue progresses, demonstrating the effectiveness of execution-aware state tracking.
- Asia > Singapore (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- (15 more...)
- Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Beyond Single-User Dialogue: Assessing Multi-User Dialogue State Tracking Capabilities of Large Language Models
Song, Sangmin, Choi, Juhwan, Yun, JungMin, Kim, YoungBin
Large language models (LLMs) have demonstrated remarkable performance in zero-shot dialogue state tracking (DST), reducing the need for task-specific training. However, conventional DST benchmarks primarily focus on structured user-agent conversations, failing to capture the complexities of real-world multi-user interactions. In this study, we assess the robustness of LLMs in multi-user DST while minimizing dataset construction costs. Inspired by recent advances in LLM-based data annotation, we extend an existing DST dataset by generating utterances of a second user based on speech act theory. Our methodology systematically incorporates a second user's utterances into conversations, enabling a controlled evaluation of LLMs in multi-user settings. Experimental results reveal a significant performance drop compared to single-user DST, highlighting the limitations of current LLMs in extracting and tracking dialogue states amidst multiple speakers. Our findings emphasize the need for future research to enhance LLMs for multi-user DST scenarios, paving the way for more realistic and robust DST models.
Factors affecting the in-context learning abilities of LLMs for dialogue state tracking
Hegde, Pradyoth, Kesiraju, Santosh, Švec, Jan, Sedláček, Šimon, Yusuf, Bolaji, Plchot, Oldřich, T, Deepak K, Černocký, Jan
This study explores the application of in-context learning (ICL) to the dialogue state tracking (DST) problem and investigates the factors that influence its effectiveness. We use a sentence embedding based k-nearest neighbour method to retrieve the suitable demonstrations for ICL. The selected demonstrations, along with the test samples, are structured within a template as input to the LLM. We then conduct a systematic study to analyse the impact of factors related to demonstration selection and prompt context on DST performance. This work is conducted using the MultiWoZ2.4 dataset and focuses primarily on the OLMo-7B-instruct, Mistral-7B-Instruct-v0.3, and Llama3.2-3B-Instruct models. Our findings provide several useful insights on in-context learning abilities of LLMs for dialogue state tracking.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > Canada > Ontario > Toronto (0.05)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- (8 more...)
A Zero-Shot Open-Vocabulary Pipeline for Dialogue Understanding
Safa, Abdulfattah, Şahin, Gözde Gül
Dialogue State Tracking (DST) is crucial for understanding user needs and executing appropriate system actions in task-oriented dialogues. Majority of existing DST methods are designed to work within predefined ontologies and assume the availability of gold domain labels, struggling with adapting to new slots values. While Large Language Models (LLMs)-based systems show promising zero-shot DST performance, they either require extensive computational resources or they underperform existing fully-trained systems, limiting their practicality. To address these limitations, we propose a zero-shot, open-vocabulary system that integrates domain classification and DST in a single pipeline. Our approach includes reformulating DST as a question-answering task for less capable models and employing self-refining prompts for more adaptable ones. Our system does not rely on fixed slot values defined in the ontology allowing the system to adapt dynamically. We compare our approach with existing SOTA, and show that it provides up to 20% better Joint Goal Accuracy (JGA) over previous methods on datasets like Multi-WOZ 2.1, with up to 90% fewer requests to the LLM API.
- North America > Canada > Ontario > Toronto (0.04)
- Asia > Singapore (0.04)
- North America > Dominican Republic (0.04)
- (12 more...)
CALICO: Conversational Agent Localization via Synthetic Data Generation
Rosenbaum, Andy, Kharazmi, Pegah, Banijamali, Ershad, Zeng, Lu, DiPersio, Christopher, Wei, Pan, Oz, Gokmen, Chung, Clement, Owczarzak, Karolina, Triefenbach, Fabian, Hamza, Wael
We present CALICO, a method to fine-tune Large Language Models (LLMs) to localize conversational agent training data from one language to another. For slots (named entities), CALICO supports three operations: verbatim copy, literal translation, and localization, i.e. generating slot values more appropriate in the target language, such as city and airport names located in countries where the language is spoken. Furthermore, we design an iterative filtering mechanism to discard noisy generated samples, which we show boosts the performance of the downstream conversational agent. To prove the effectiveness of CALICO, we build and release a new human-localized (HL) version of the MultiATIS++ travel information test set in 8 languages. Compared to the original human-translated (HT) version of the test set, we show that our new HL version is more challenging. We also show that CALICO out-performs state-of-the-art LINGUIST (which relies on literal slot translation out of context) both on the HT case, where CALICO generates more accurate slot translations, and on the HL case, where CALICO generates localized slots which are closer to the HL test set.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > New York (0.05)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (9 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.90)
- (2 more...)
Zero-shot Slot Filling in the Age of LLMs for Dialogue Systems
Rana, Mansi, Hacioglu, Kadri, Gopalan, Sindhuja, Boothalingam, Maragathamani
Zero-shot slot filling is a well-established subtask of Natural Language Understanding (NLU). However, most existing methods primarily focus on single-turn text data, overlooking the unique complexities of conversational dialogue. Conversational data is highly dynamic, often involving abrupt topic shifts, interruptions, and implicit references that make it difficult to directly apply zero-shot slot filling techniques, even with the remarkable capabilities of large language models (LLMs). This paper addresses these challenges by proposing strategies for automatic data annotation with slot induction and black-box knowledge distillation (KD) from a teacher LLM to a smaller model, outperforming vanilla LLMs on internal datasets by 26% absolute increase in F1 score. Additionally, we introduce an efficient system architecture for call center product settings that surpasses off-the-shelf extractive models by 34% relative F1 score, enabling near real-time inference on dialogue streams with higher accuracy, while preserving low latency.
Cross-lingual Back-Parsing: Utterance Synthesis from Meaning Representation for Zero-Resource Semantic Parsing
Kang, Deokhyung, Hwang, Seonjeong, Kim, Yunsu, Lee, Gary Geunbae
Recent efforts have aimed to utilize multilingual pretrained language models (mPLMs) to extend semantic parsing (SP) across multiple languages without requiring extensive annotations. However, achieving zero-shot cross-lingual transfer for SP remains challenging, leading to a performance gap between source and target languages. In this study, we propose Cross-Lingual Back-Parsing (CBP), a novel data augmentation methodology designed to enhance cross-lingual transfer for SP. Leveraging the representation geometry of the mPLMs, CBP synthesizes target language utterances from source meaning representations. Our methodology effectively performs cross-lingual data augmentation in challenging zero-resource settings, by utilizing only labeled data in the source language and monolingual corpora. Extensive experiments on two cross-language SP benchmarks (Mschema2QA and Xspider) demonstrate that CBP brings substantial gains in the target language. Further analysis of the synthesized utterances shows that our method successfully generates target language utterances with high slot value alignment rates while preserving semantic integrity. Our codes and data are publicly available at https://github.com/deokhk/CBP.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Middle East > Cyprus > Nicosia > Nicosia (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (3 more...)
Making Task-Oriented Dialogue Datasets More Natural by Synthetically Generating Indirect User Requests
Mannekote, Amogh, Nam, Jinseok, Li, Ziming, Gao, Jian, Boyer, Kristy Elizabeth, Dorr, Bonnie J.
Indirect User Requests (IURs), such as "It's cold in here" instead of "Could you please increase the temperature?" are common in human-human task-oriented dialogue and require world knowledge and pragmatic reasoning from the listener. While large language models (LLMs) can handle these requests effectively, smaller models deployed on virtual assistants often struggle due to resource constraints. Moreover, existing task-oriented dialogue benchmarks lack sufficient examples of complex discourse phenomena such as indirectness. To address this, we propose a set of linguistic criteria along with an LLM-based pipeline for generating realistic IURs to test natural language understanding (NLU) and dialogue state tracking (DST) models before deployment in a new domain. We also release IndirectRequests, a dataset of IURs based on the Schema Guided Dialog (SGD) corpus, as a comparative testbed for evaluating the performance of smaller models in handling indirect requests.
An Approach to Build Zero-Shot Slot-Filling System for Industry-Grade Conversational Assistants
Bhargav, G P Shrivatsa, Neelam, Sumit, Sharma, Udit, Ikbal, Shajith, Sreedhar, Dheeraj, Karanam, Hima, Joshi, Sachindra, Dhoolia, Pankaj, Garg, Dinesh, Croutwater, Kyle, Qi, Haode, Wayne, Eric, Murdock, J William
We present an approach to build Large Language Model (LLM) based slot-filling system to perform Dialogue State Tracking in conversational assistants serving across a wide variety of industry-grade applications. Key requirements of this system include: 1) usage of smaller-sized models to meet low latency requirements and to enable convenient and cost-effective cloud and customer premise deployments, and 2) zero-shot capabilities to serve across a wide variety of domains, slot types and conversational scenarios. We adopt a fine-tuning approach where a pre-trained LLM is fine-tuned into a slot-filling model using task specific data. The fine-tuning data is prepared carefully to cover a wide variety of slot-filling task scenarios that the model is expected to face across various domains. We give details of the data preparation and model building process. We also give a detailed analysis of the results of our experimental evaluations. Results show that our prescribed approach for slot-filling model building has resulted in 6.9% relative improvement of F1 metric over the best baseline on a realistic benchmark, while at the same time reducing the latency by 57%. More over, the data we prepared has helped improve F1 on an average by 4.2% relative across various slot-types.
- Asia > Singapore (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Italy (0.04)
- (4 more...)